CapsEnhancer: An Effective Computational Framework for Identifying Enhancers Based on Chaos Game Representation and Capsule Network

Enhancers are a class of noncoding DNA, serving as crucial regulatory elements in governing gene expression by binding to transcription factors. The identification of enhancers holds paramount importance in the field of biology. However, traditional experimental methods for enhancer identification demand substantial human and material resources. Consequently, there is a growing interest in employing computational methods for enhancer prediction. In this study, we propose a two-stage framework based on deep learning, termed CapsEnhancer, for the identification of enhancers and their strengths. CapsEnhancer utilizes chaos game representation to encode DNA sequences into unique images and employs a capsule network to extract local and global features from sequence “images”. Experimental results demonstrate that CapsEnhancer achieves state-of-the-art performance in both stages. In the first and second stages, the accuracy surpasses the previous best methods by 8 and 3.5%, reaching accuracies of 94.5 and 95%, respectively. Notably, this study represents the pioneering application of computer vision methods to enhancer identification tasks. Our work not only contributes novel insights to enhancer identification but also provides a fresh perspective for other biological sequence analysis tasks.


■ INTRODUCTION
Enhancers, short noncoding DNA sequences interspersed throughout the genome, play an indispensable role in the orchestration of gene expression and, by extension, every biological process in living organisms. 1−3 These unique genomic elements are known to amplify the transcription rate of their associated genes, acting as regulators in the vast genomic machinery. Enhancers facilitate the binding of proteins, such as transcription factors and coactivators, which modulate transcription initiation, thereby influencing many cellular activities such as differentiation, development, and responses to environmental stimuli. 4 Notably, enhancers can function from variable distances away from the genes they regulate and can be located upstream, downstream, or even within intronic regions of these genes. −7 The functional importance of enhancers extends beyond merely amplifying gene expression; they are critical in determining the spatiotemporal patterns of gene activity, thereby shaping the identity and function of each cell type. In essence, enhancers are at the heart of cellular diversity and organismal complexity. Their malfunctioning is associated with various genetic disorders, including cancer, highlighting their importance in maintaining cellular homeostasis. 8

Traditional enhancer identification methods, 9−11 such as ChIP-seq, can identify enhancers but face challenges including high costs, low throughput, and extensive starting material requirements, which make them time-consuming, labor-intensive, and expensive. Moreover, the vastness and complexity of the human genome make the large-scale application of these experimental methods impractical. 12 As a result, the scientific community is increasingly utilizing computational methodologies to identify and classify enhancers. The advent of next-generation sequencing technologies has led to the development of numerous computational strategies aimed at distinguishing enhancers from other noncoding genomic regions, offering an economical and efficient alternative to traditional experimental approaches. 13

Computational methods generally fall into two categories: those based on traditional machine learning classifiers and those based on deep learning. Support vector machine (SVM) and random forest (RF) algorithms are frequently employed for enhancer classification. Liu et al. proposed a method called iEnhancer-2L, integrating the pseudo k-tuple nucleotide composition (PseKNC) of DNA sequences and utilizing SVM for enhancer identification. 14 Jia et al. introduced a tool named EnhancerPred, integrating biprofile Bayes (BPB), nucleotide composition (NC), and PseNC, constructing a classifier using SVM. 15 Lim et al. developed an RF-based tool called iEnhancer-RF, integrating six kinds of features of DNA sequences. 16 Methods based on ensemble learning are also employed for the recognition of enhancers. In 2018, Liu et al. proposed an ensemble learning method named iEnhancer-EL, utilizing PseKNC, Kmer, and subsequence profile of sequences to predict enhancers. 17 Similarly, Wang et al. developed Enhancer-FRL, integrating ten kinds of features and employing five machine learning methods, including SVM and RF, to predict enhancers and their activities. 18 Gill et al. developed a deep forest-based tool, NEPERS, integrating four kinds of features to identify enhancers. 19

In recent years, with the development of deep learning, an increasing number of deep learning-based enhancer classifiers have emerged alongside those built on traditional machine learning algorithms. Nguyen et al. developed iEnhancer-ECNN, utilizing one-hot encoding of sequences and a convolutional neural network (CNN) to predict enhancers. 20 Le et al. introduced BERT-Enhancer, utilizing a BERT pretraining model to extract sequence encoding, followed by a CNN to build a classifier. 21 Niu et al. developed a tool named iEnhancer-EBLSTM, using Kmers information from DNA sequences and employing bidirectional LSTM (BiLSTM) to construct an enhancer classifier. 22 A summary of relevant work is presented in Table 1.
While the computational methodologies mentioned above exhibit promising outcomes, with each demonstrating distinct merits, additional investigation is warranted for the following reasons. −32 For example, Li et al. proposed a method called GCR-Net, which models Kmers of genomic sequences hierarchically to enhance the prediction of translation initiation sites. 33 Nevertheless, extant models, including CNNs, exhibit suboptimal performance in learning the intricate associations between Kmers and their respective frequencies. An efficacious method for sequence encoding, chaos game representation (CGR), has the capability to transform biosequences into two-dimensional images. 34 Because CGR encodes Kmers frequencies as visual representations, computer vision methods can be applied directly to discern and learn the patterns inherent in CGR-encoded images. 35

Moreover, although CNNs have achieved a series of successes in recent years, they are not without limitations. Traditional CNNs struggle to capture the hierarchical structure and part-whole relationships of objects, which limits their ability to relate the overall and local features of objects. 36,37 In recent years, researchers have been continuously developing novel methods to address the aforementioned limitations and applying them to various tasks. Guo et al. proposed a variational gated autoencoder-based feature extraction model to extract complex contextual features and infer disease-miRNA associations. 38 Additionally, a method called MCANet, which integrates multiscale convolution and self-attention mechanisms, adaptively reveals spatial-temporal contextual dependence to enhance Poly(A) signal prediction. 39 Wang et al. introduced a cross-feature enhancement module, which effectively reduces information redundancy and facilitates the integration and modeling of complementary features using attention mechanisms. 40

To further address the constraints of CNNs, a new generation of neural networks, known as capsule networks, has emerged. 41 Capsule networks introduce the concept of capsules to better capture the spatial hierarchical structure within objects. Each capsule represents a specific entity or part, and the relationships between capsules can be modeled. This contributes to an enhanced understanding of object hierarchical structures by the network. −45

In this study, we propose a new scheme named CapsEnhancer, designed to achieve the identification of enhancers and their strength. The workflow of CapsEnhancer is shown in Figure 1. Experimental results demonstrate that CapsEnhancer achieves satisfactory performance on benchmark data sets. The main contributions of this study can be summarized as follows.
(1) We designed a two-stage computational framework called CapsEnhancer to identify enhancers and their strengths. The first stage of CapsEnhancer focuses on enhancer recognition, distinguishing between enhancer and nonenhancer. The second stage involves predicting enhancer strength, specifically discerning between strong and weak enhancers. In comparison to previous methods, CapsEnhancer exhibits significant improvements, achieving an 8% increase in accuracy during the first stage and a 3.5% improvement in the second stage. Beyond providing a robust solution for enhancer identification, our framework introduces a novel perspective for other biological sequence analysis tasks.

■ MATERIALS AND METHODS
Benchmark Data Set. In order to facilitate fair comparisons, we employed the data set constructed by Liu et al., 14,17 which has been widely used in enhancer prediction tasks. 15,16,18,20,29 The enhancers within this data set were derived from nine distinct cell lines and were extracted as short DNA fragments of uniform 200 bp length. Subsequently, the CD-HIT software was employed to eliminate paired sequences exhibiting a similarity surpassing 20%.
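The final data set can be represented as follows

$$S = S^{+} \cup S^{-}, \qquad S^{+} = S^{+}_{\mathrm{Strong}} \cup S^{+}_{\mathrm{Weak}}$$

where $S^{+}$ and $S^{-}$ are the enhancer and nonenhancer subsets, and $S^{+}_{\mathrm{Strong}}$ and $S^{+}_{\mathrm{Weak}}$ are the strong and weak enhancer subsets.

To facilitate a more comprehensive understanding of the distinctions between positive and negative samples, the GC content of the data sets for the two stages was plotted, as presented in Figure S1. It is evident that enhancers exhibit a higher GC content compared to nonenhancers. Furthermore, strong enhancers also display a higher GC content when contrasted with weak enhancers.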
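The GC content comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names `gc_content` and `mean_gc` are hypothetical, and the toy sequences are invented for demonstration and are not drawn from the benchmark data set.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return (seq.count("G") + seq.count("C")) / valid


def mean_gc(seqs):
    """Average GC content over a non-empty collection of sequences."""
    values = [gc_content(s) for s in seqs]
    return sum(values) / len(values)


# Toy sequences for illustration only (ASSUMPTION: not benchmark data).
toy_enhancers = ["GCGCATGC", "GGCCTTAG"]
toy_nonenhancers = ["ATATTAGC", "TTAAATCG"]
print(mean_gc(toy_enhancers), mean_gc(toy_nonenhancers))
```

Applied per class, such averages (or the full per-sequence distributions) would give the kind of comparison plotted in Figure S1.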
Architecture Overview of CapsEnhancer. The model architecture of CapsEnhancer is illustrated in Figure 2. CapsEnhancer is a two-stage framework wherein the first stage aims to distinguish enhancers from nonenhancers, while the second stage focuses on distinguishing between strong and weak enhancers. Initially, DNA sequences are transformed into images through CGR encoding. Subsequently, a 2-dimensional convolutional neural network (Conv2D) is employed to preliminarily extract features from these images. The acquired preliminary features are then fed into a capsule network for further feature extraction and spatial modeling from the images. The main notations of this study are summarized in Table 2.
CGR Encoding. CGR is a mathematical method that employs iterated function systems to convert sequential data into a fractal depiction within a two-dimensional space. CGR is a milestone in graphical bioinformatics and is considered a powerful tool for feature encoding in biological sequences, including DNA, RNA, and protein sequences. 35,46 We employed CGR encoding to encode DNA sequences in this study. Initially, the four nucleotides (A, C, G, T) are allocated to the four vertices of a square. Figure 3A provides an illustrative example of encoding a sequence using CGR representation. For a DNA sequence $s$ of length $n$, where $s = s_1, ..., s_i, ..., s_n$ and $s_i \in \{A, C, G, T\}$, the coordinates of nucleotide $s_i$ are determined by the current nucleotide type and the coordinates of the preceding nucleotide $s_{i-1}$. The position of $s_i$ is located halfway along the line connecting the preceding position and the vertex associated with the nucleotide. CGR encoding of sequence $s$ is a two-dimensional representation of ordered pairs $(x_1, y_1)$ through $(x_i, y_i)$ to $(x_n, y_n)$, where $(x_i, y_i)$ is defined as follows

$$x_i = \tfrac{1}{2}\left(x_{i-1} + g_{i,x}\right), \qquad y_i = \tfrac{1}{2}\left(y_{i-1} + g_{i,y}\right)$$

where $(x_0, y_0) = (0, 0)$ and $(g_{i,x}, g_{i,y})$ are the coordinates of the vertex assigned to nucleotide $s_i$.

Figure 3B illustrates the partitioning of the CGR space during the iterative process. Each subsquare within the CGR space holds distinctive significance. Upon dividing the CGR into four quadrants, the upper right quadrant encompasses points that represent subsequences terminating with the nucleotide T. This is because the midpoint between any other point within the square and the corner T invariably resides within this quadrant. Upon subdividing this quadrant into four squares in a clockwise order, they, respectively, denote subsequences concluding with TT, GT, CT, and AT. This configuration facilitates the computation of 2-mer counts by tallying the points within these designated subsquares.
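The midpoint rule described above can be sketched in plain Python. The vertex layout below is an assumption inferred from the text (T in the upper right, then G, C, A in clockwise order) on a square centered at the origin; the actual coordinates used by the authors' R-based implementation are not given here, and the names `VERTICES` and `cgr_points` are hypothetical.

```python
# ASSUMPTION: vertex layout inferred from the text (T upper right, then
# G, C, A clockwise) on a square centered at the starting point (0, 0).
VERTICES = {
    "A": (-1.0, 1.0),
    "C": (-1.0, -1.0),
    "G": (1.0, -1.0),
    "T": (1.0, 1.0),
}


def cgr_points(seq, start=(0.0, 0.0)):
    """Return the CGR coordinates (x_1, y_1), ..., (x_n, y_n) of a DNA sequence."""
    x, y = start
    points = []
    for base in seq.upper():
        vx, vy = VERTICES[base]  # raises KeyError for non-ACGT symbols
        # Move halfway from the previous point toward the nucleotide's vertex.
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points


# CATG is the example sequence named in the Figure 3A caption.
for base, point in zip("CATG", cgr_points("CATG")):
    print(base, point)
```

Under this layout every point produced by a T has positive x and y, which matches the statement that subsequences ending in T fall in the upper right quadrant.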
In contrast to the precise coordinate representation employed by the original CGR, a discretization method known as the frequency chaos game representation (FCGR) has been introduced to provide a coarser, less noise-sensitive abstraction for sequences. FCGR, an extension of CGR, involves a grid-based counting approach for determining the points within the CGR. The initial step of FCGR involves partitioning the CGR image into N × N regions. Subsequently, the point count within each region serves as the region's frequency, enabling the compression of the CGR and resulting in an FCGR matrix with dimensions N × N applicable to input sequences of varying lengths. Therefore, the predefined grid values can serve as a representation of the frequency of Kmers. In this study, we opt for N = 64 as the parameter for generating 64 × 64 images corresponding to each DNA sequence.
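The grid counting step can be illustrated with a short, self-contained sketch. It assumes the same hypothetical vertex layout as inferred from the text (T upper right, then G, C, A clockwise) and a square spanning [-1, 1] in both axes; the function name `fcgr` and the row orientation (row 0 at the top of the image) are illustrative choices, not the authors' implementation.

```python
# ASSUMPTION: vertex layout and square extent are inferred, not taken from the source.
VERTICES = {"A": (-1.0, 1.0), "C": (-1.0, -1.0), "G": (1.0, -1.0), "T": (1.0, 1.0)}


def fcgr(seq, n=64):
    """Count CGR points falling in each cell of an n x n grid over [-1, 1]^2."""
    grid = [[0] * n for _ in range(n)]
    x = y = 0.0
    for base in seq.upper():
        vx, vy = VERTICES[base]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        # Map coordinates in (-1, 1) to integer cell indices in [0, n - 1].
        col = min(int((x + 1.0) / 2.0 * n), n - 1)
        row = min(int((1.0 - y) / 2.0 * n), n - 1)  # row 0 is the top of the image
        grid[row][col] += 1
    return grid


image = fcgr("CATG" * 50, n=64)  # a 200 bp toy sequence, not benchmark data
print(len(image), len(image[0]), sum(map(sum, image)))
```

Because 64 = 2^6, each cell of the 64 × 64 grid collects points whose six most recent nucleotides form one particular 6-mer (the first few points of a sequence, which have fewer than six predecessors, are the exception), so the matrix behaves like a spatially arranged Kmers frequency table.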
Figure S2A,B depicts the FCGR images of an enhancer sequence and a nonenhancer sequence, respectively. Their FCGR images exhibit markedly distinct patterns, including highlighted regions in red and blue, which could potentially serve as

41 Within the primary capsule layer resides a Conv2D layer, employed for further feature extraction. The outputs from this Conv2D layer are transformed into multiple m-dimensional vectors (the dimensionality m being a hyperparameter). These m-dimensional vectors undergo a nonlinear "squash" function that retains the direction of the vector while constraining its magnitude to a range between 0 and 1.
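The effect of the squash nonlinearity can be seen in a small sketch. The formula used below is an assumption: it is the squash function from the original capsule network formulation, which this text cites but does not print. The helper names `squash` and `length` and the `eps` stabilizer are illustrative.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction.

    ASSUMPTION: standard capsule-network squash,
    v = (|s|^2 / (1 + |s|^2)) * (s / |s|); eps avoids division by zero.
    """
    sq_norm = sum(v * v for v in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * v for v in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


long_vec = squash([3.0, 4.0])     # input length 5, output length close to 25/26
short_vec = squash([0.03, 0.04])  # input length 0.05, output length close to 0
print(length(long_vec), length(short_vec))
```

Long vectors are mapped to lengths just below 1 and short vectors to lengths near 0, which is what allows a capsule's length to be read as a probability.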
For the binary classification task, the type capsule layer encompasses two n-dimensional capsules: one positive capsule and one negative capsule. The length of each capsule in the type capsule layer represents the probability of being predicted as a positive (or negative) sample. Figure 4 illustrates the computational process between the primary and type capsule layers.
To derive the prediction vectors from capsule i to j, the outputs of the primary capsule layer $u_i$ are initially multiplied by a learnable weight matrix $W_{i,j}$. Subsequently, $S_j$ is determined as the weighted sum of all computed $\hat{u}_{j|i}$.

$$\hat{u}_{j|i} = W_{i,j}\, u_i$$

$$S_j = \sum_{i=1}^{L} c_{i,j}\, \hat{u}_{j|i}$$

where i and j, respectively, denote capsules from the primary capsule layer and the type capsule layer, and L is the number of primary capsules. Here $c_{i,j}$ represents the coupling coefficients, determined by the dynamic routing algorithm (see Algorithm S1 in Supporting Information), indicating the degree of coupling between primary capsule i and type capsule j. $S_j$ is fed into the squash function to produce an output vector $V_j$ with a length between 0 and 1.
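The computation between the primary and type capsule layers can be sketched as follows. Algorithm S1 is not reproduced in this text, so the routine below is an assumption: it follows the routing-by-agreement procedure of the original capsule network formulation (softmax over type capsules, three iterations by default). All names (`dynamic_routing`, `u_hat`, the toy prediction vectors) are hypothetical, and the prediction vectors are given directly rather than computed from learned weight matrices.

```python
import math


def squash(s, eps=1e-9):
    """Standard squash (ASSUMPTION): keeps direction, maps length into [0, 1)."""
    sq_norm = sum(v * v for v in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * v for v in s]


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector from primary capsule i to type capsule j."""
    n_primary, n_type, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_type for _ in range(n_primary)]  # routing logits
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients c[i][j]
        v = []
        for j in range(n_type):
            # S_j: coupling-weighted sum of the prediction vectors for capsule j.
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_primary))
                   for k in range(dim)]
            v.append(squash(s_j))
        if it < iterations - 1:
            # Raise the logit of each (i, j) pair by its agreement with V_j.
            for i in range(n_primary):
                for j in range(n_type):
                    b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c


# Toy prediction vectors: 3 primary capsules, 2 type capsules, 2 dimensions.
u_hat = [
    [[1.0, 0.0], [0.1, 0.0]],
    [[0.9, 0.1], [0.0, 0.1]],
    [[1.0, 0.1], [-0.1, 0.0]],
]
v, c = dynamic_routing(u_hat)
probs = [math.sqrt(sum(x * x for x in vj)) for vj in v]  # capsule lengths
print(probs)
```

In this toy case the three primary capsules agree on the first type capsule, so its output vector is much longer than that of the second, and the coupling coefficients shift toward it over the iterations.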
Based on the theory of capsule networks, the vector $V_j$ is utilized to model positive and negative samples (enhancer versus nonenhancer, or strong enhancer versus weak enhancer, in this task). Each element of $V_j$ represents a feature of positive or negative samples, and the length of $V_j$ signifies the probability of being predicted as a positive or negative sample. Hence, to derive the predicted probabilities, it is necessary, at the network's terminus, to compute the length of $V_j$, as delineated by the following formula.

$$p_j = \lVert V_j \rVert = \sqrt{\sum_{k} V_{j,k}^{2}}$$

where $p_j$ denotes the model's prediction for the positive class or the negative class, and $V_{j,k}$ is the k-th element of $V_j$.

Performance Assessment. In this study, we used accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC) as evaluation metrics for the two-stage task. Their definitions are as follows.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP, TN, FP and FN represent the number of true positives, true negatives, false positives and false negatives, respectively.
The CGR encoding was implemented using the R package "Kaos". 47 The model was trained for 100 epochs to ensure adequate fitting, employing the Adam optimizer 48 with an initial learning rate set to 0.1. Hyperparameter tuning was conducted using grid search and cross-validation techniques, with the specific search space outlined in Table S1. We iterated over all possible combinations of the specified hyperparameter values and evaluated each combination using fivefold cross-validation to identify the best-performing combination. The pipeline for CapsEnhancer was established using PyTorch, 49 and the training process utilized 4 × Nvidia 2080 Ti GPUs.
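The grid search with fivefold cross-validation can be outlined as below. This is a sketch under stated assumptions: the search space shown is hypothetical (the real one is in Table S1, which is not reproduced here), the fold split is contiguous rather than shuffled or stratified, and `dummy_evaluate` is a placeholder standing in for training the model on the training folds and scoring it on the held-out fold.

```python
import itertools


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(search_space, n_samples, evaluate, k=5):
    """Return the hyperparameter combination with the best mean validation score."""
    folds = k_fold_indices(n_samples, k)
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            scores.append(evaluate(params, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# HYPOTHETICAL search space; the actual one is listed in Table S1.
search_space = {"capsule_dim": [8, 16, 32], "routing_iterations": [1, 3, 5]}


def dummy_evaluate(params, train_idx, val_idx):
    """Placeholder score; a real version would train and validate the model."""
    return -abs(params["capsule_dim"] - 32) - abs(params["routing_iterations"] - 3)


# 2968 = 1484 enhancers + 1484 nonenhancers in the stage 1 training data.
best, score = grid_search(search_space, n_samples=2968, evaluate=dummy_evaluate)
print(best, score)
```

In practice the sample indices would be shuffled (and ideally stratified by class) before splitting, since a contiguous split of a class-sorted data set would produce unbalanced folds.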

■ RESULTS AND DISCUSSION
Performance Comparison with Existing Methods. First Stage: Enhancer Versus Nonenhancer. The capsule network-based learning scheme with CGR encoding was applied to the standard binary classification task of distinguishing enhancer from nonenhancer sequences. We conducted a fair comparison with 13 existing tools, reporting performance on an independent test set as shown in Table 3. Table 3 shows that CapsEnhancer achieves state-of-the-art performance compared to existing methods, with substantial improvements in accuracy, sensitivity, specificity, MCC, and AUC. In comparison to the second-ranked NEPERS method, CapsEnhancer exhibits an 8% improvement in ACC, reaching an accuracy of 94.5%, indicative of its precision in enhancer prediction. In terms of MCC, CapsEnhancer outperforms the second-ranked method by 0.16, reaching a value of 0.89. Furthermore, CapsEnhancer excels in sensitivity and specificity, surpassing the second-ranked method by 6 and 10%, reaching 93 and 96%, respectively. Notably, both sensitivity and specificity for CapsEnhancer exceed 90%, indicating its ability to provide more balanced predictions.
Moreover, a significant challenge for two-stage tasks is how to handle false positive samples from the first stage. These false positives will also undergo the prediction task in the second stage, thereby affecting the robustness of the model. As shown in Table 3, CapsEnhancer has a very low false positive rate (1 − specificity) of only 4%, which is more than a 10% reduction compared to other existing methods. This demonstrates that CapsEnhancer is more robust than other methods and better at avoiding false positives.

Second Stage: Strong Enhancer Versus Weak Enhancer. The second stage of CapsEnhancer involves the task of predicting enhancer strength, specifically distinguishing between strong enhancers and weak enhancers. We compared the performance of the second stage with existing methods, and the results are presented in Table 4. CapsEnhancer continues to exhibit strong predictive performance in the second stage, showing a significant lead in metrics such as accuracy, sensitivity, specificity, MCC, and AUC. In terms of accuracy, CapsEnhancer outperforms the second-ranked iEnhancer-DCSA by 3.5%, achieving an accuracy of 95%. Furthermore, in terms of MCC, it surpasses the second-ranked method by 0.06, reaching a value of 0.903. CapsEnhancer demonstrates satisfactory performance in sensitivity and specificity, achieving 99 and 91%, respectively.
The results above demonstrate that CapsEnhancer has achieved outstanding performance in both stages of the task, which can be attributed to several factors. First, the use of CGR encoding serves as an efficient method for converting DNA sequences into two-dimensional images, enabling the application of computer vision techniques to sequence-related problems. Importantly, CGR encoding excels in capturing the frequency of Kmers. 35 Prior literature has emphasized the significance of Kmers frequency as a critical feature in DNA sequence analysis. 46,50,51 Second, owing to the architecture of the capsule network, the introduction of the capsule concept allows for effective spatial modeling of input images. Capsule networks overcome traditional CNN limitations, such as the inability to comprehend spatial relationships between features and the loss of invariance due to pooling operations. Consequently, in this context, capsule networks successfully learn the relationships between Kmers. Furthermore, owing to the aforementioned advantages of CapsEnhancer, we conducted a case study to illustrate its efficacy in managing sequencing errors and its capability to extend effectively to sequences of nonuniform lengths. Detailed information can be found in the case study section in Supporting Information. In conclusion, the synergistic combination of CGR encoding and capsule networks constitutes a pivotal factor in improving the performance of enhancer prediction tasks.
Effectiveness of the Capsule Network Architecture. The interaction between the primary capsule layer and the type capsule layer is at the core of the entire capsule network architecture. To visually demonstrate the superiority of the capsule network architecture, we extracted features from the samples in the training sets at both the primary capsule layer and the type capsule layer in the two stages. Subsequently, utilizing the PCA dimensionality reduction technique, we reduced the extracted features to 2 dimensions, corresponding to the scatter plots shown in Figure 5. Red and blue points represent positive and negative samples, respectively (stage 1: enhancer and nonenhancer; stage 2: strong enhancer and weak enhancer).
From Figure 5A,C, it can be observed that the points representing positive and negative samples are entangled, exhibiting similar distributions, making it challenging to distinguish between them. However, after passing through the capsule network (Figure 5B,D), the red and blue points manifest clearly distinct distributions, facilitating easy differentiation. This indicates that the computational processes in the primary capsule layer and type capsule layer further refine the features. The processed features enable positive and negative samples to exhibit disparate distributions, thereby enhancing the predictive capabilities of the model. This improvement is attributed to the dynamic routing algorithm of the capsule network, which allows information propagation and weight adjustments between different capsules. This dynamic routing mechanism is crucial for the capsule network's ability to model spatial relationships among different features.
Ablation Experiment. Subsequently, we conducted ablation experiments to further validate the significance of the capsule network. We replaced the capsule network with a multilayer perceptron and performed experiments at both stages. The experimental results are presented in Table 5. It is evident that upon removing the capsule network, the model's performance significantly deteriorated at both stages. In the first stage, the accuracy dropped by about 17% relative to CapsEnhancer, reaching only 77.3%. In the second stage, the accuracy was lower by 9%, reaching only 86%. In terms of MCC, the absence of the capsule network architecture resulted in a decrease of 0.345 and 0.161 at the two stages, yielding MCC values of 0.545 and 0.742, respectively.
Furthermore, in the second stage of the model, without the capsule network, the sensitivity and specificity were 98 and 74%, respectively. This indicates that in the absence of the capsule network, the model not only fails to achieve precise predictions but also lacks the ability to achieve a balanced prediction.

Journal of Chemical Information and Modeling
In order to visually demonstrate the performance of the ablation experiments, we plotted the receiver operating characteristic (ROC) curves for CapsEnhancer and the models without CapsNet in the two stages, as illustrated in Figure 6. As evident from Figure 6, in both stage 1 and stage 2, the ROC curve corresponding to CapsEnhancer consistently lies above that of the model without CapsNet, achieving a higher AUC. In addition, we also plotted the precision-recall (PR) curves for both stages, as depicted in Figure S3. Similar to the ROC curves, CapsEnhancer achieved a higher area under the PR curve compared to the models without CapsNet. This further underscores the significance of the capsule network architecture in improving the predictive capabilities for the enhancer task.
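For readers who want to reproduce an AUC comparison of this kind, the area under the ROC curve can be computed directly from labels and scores through its rank interpretation. The sketch below is illustrative: the function name `roc_auc` and the toy labels and scores are hypothetical, and using the length of the positive type capsule as the score is an assumption about how the curve would be drawn.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration (ASSUMPTION: not model outputs).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

This pairwise form is quadratic in the number of samples, which is acceptable for test sets of a few hundred sequences; a sort-based implementation would be preferable for larger data.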
Feature Analysis. Within the domain of deep learning, discriminative features play a pivotal role in the development of robust classifiers. In contrast to existing methods, CapsEnhancer has two principal advantages: first, it leverages CGR encoding for the representation of DNA sequences, and second, it effectively learns from CGR images through the deployment of a capsule network architecture. Consider the first stage, where the type capsule layer encompasses two capsules, each constituting a 32-dimensional vector corresponding to the enhancer or nonenhancer category. This configuration facilitates the construction of distinct features associated with enhancer and nonenhancer attributes. To underscore the discriminative efficacy of features extracted by CapsEnhancer concerning enhancers, 30 enhancers and 30 nonenhancers were randomly selected from the test set for comprehensive feature clustering analysis on the corresponding DNA sequences. In this analysis, we used hierarchical clustering with the complete linkage method. The resultant clustering patterns, illustrated in Figure 7, reveal two key observations: enhancers and nonenhancers distinctly cluster into separate subtrees, and DNA sequences of the same classification often exhibit analogous feature patterns. These findings provide compelling evidence that the features derived from the proposed CapsEnhancer method adeptly encapsulate traits pertinent to enhancers, offering further justification for the method's effectiveness.
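Complete-linkage hierarchical clustering, as used in this analysis, can be sketched with the standard library alone. This is a minimal illustration, not the authors' pipeline: the function name `complete_linkage` is hypothetical, Euclidean distance is an assumed metric, and the 2-dimensional toy vectors stand in for the 32-dimensional capsule features.

```python
import math


def complete_linkage(points, n_clusters=2):
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


# Toy feature vectors (ASSUMPTION: invented values, not capsule outputs).
features = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.05),  # enhancer-like
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.95),  # nonenhancer-like
]
clusters = complete_linkage(features, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Stopping at two clusters mirrors the two subtrees observed in Figure 7; recording the merge order instead of stopping early would give the full dendrogram.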

■ CONCLUSIONS
Enhancers are a type of noncoding DNA element that can regulate gene expression. The identification of enhancers is crucial in the field of biology. First, it provides insights into the complex networks of gene regulation that govern various biological processes, such as development, differentiation, and response to environmental stimuli. By pinpointing enhancers associated with specific genes, researchers can unravel the molecular mechanisms underlying normal cellular functions and pathological conditions. Furthermore, the identification of enhancers has significant implications in the context of human health. Dysregulation of gene expression, often influenced by aberrant enhancer activity, is implicated in numerous diseases, including cancers and developmental disorders. Unraveling enhancer landscapes helps researchers identify potential therapeutic targets and develop strategies for precise intervention in gene expression patterns.
Traditional experimental methods, while effective in identifying enhancers, often demand substantial human and financial resources. In recent years, there has been an increasing emphasis on employing computational approaches for enhancer identification, driven by the rapid advancements in artificial intelligence methods. In this study, we propose a two-stage framework, CapsEnhancer, based on deep learning to efficiently predict enhancers and their strengths. The first stage focuses on identifying enhancers, while the second stage aims to distinguish strong from weak enhancers. Initially, we employ CGR encoding to represent each DNA sequence as an image, enabling the efficient representation of Kmers and their frequencies. Furthermore, we utilize a capsule network-based architecture to extract local and global features of the images, overcoming the limitations of traditional CNNs and providing spatial modeling for features of these images. Experimental results demonstrate the outstanding predictive capabilities of our method in both stages, achieving state-of-the-art performance. This study employs computer vision methods to handle sequence data, and we believe that our research not only offers novel insights into enhancer identification but also provides a fresh perspective for other biological sequence analysis tasks.

■ KEY POINTS
• We proposed a two-stage framework, CapsEnhancer, based on deep learning, for accurate prediction of enhancers and their strength.
• CapsEnhancer employs CGR encoding to represent each DNA sequence as an image. Through this encoding methodology, it enables effective representation of Kmers and their frequencies.
• CapsEnhancer utilizes an architecture based on capsule networks to learn both local and global features from DNA "images". Capsule networks overcome the limitations of traditional CNNs by capturing spatial relationships among features in DNA "images", thereby enhancing the model's performance.
• The framework proposed in our study employs computer vision strategies to process biosequence data, complemented by the integration of a next-generation neural network, the capsule network. This presents a novel approach and perspective for tasks of biosequence data analysis.
■ ASSOCIATED CONTENT

(2) CapsEnhancer uses CGR encoding to represent each DNA sequence as an image. Through this encoding method, it can effectively represent Kmers and their frequencies.
(3) CapsEnhancer employs a capsule network-based architecture to learn local and global features from the "images" transformed from DNA sequences. CapsEnhancer represents the pioneering adoption of computer vision strategies for enhancer identification.
(4) Experimental results demonstrate that CapsEnhancer attains state-of-the-art performance in the two-stage task.

Figure 1. Workflow of CapsEnhancer. First, we utilized benchmark data sets from previous studies. Subsequently, each DNA sequence was encoded using CGR encoding and represented as corresponding two-dimensional images. The model was then constructed using an architecture based on capsule networks. Hyperparameter adjustment was performed through fivefold cross-validation, and the model was evaluated using an independent test set, with the subsequent reporting of model performance metrics. The trained model was ultimately employed for enhancer identification, constituting a two-stage task. The first stage focused on discerning enhancers from nonenhancers, while the second stage aimed to predict enhancer strength, i.e., strong enhancers versus weak enhancers. The second stage employs the same FCGR images as the first stage to maintain consistency in the input representation. Capsule networks are used in both stages to build the models.

Figure 2. Architecture of CapsEnhancer. First, DNA sequences are encoded using CGR encoding and represented as 2D images. Subsequently, they are input into a Conv2D for preliminary feature extraction. Following this, the data is fed into a capsule network, which consists of a primary capsule layer and a type capsule layer. The primary capsule layer includes a Conv2D for further extracting local features. Then, a dynamic routing algorithm is utilized to capture the spatial relationships of features, resulting in the type capsule layer. As the task is a standard binary classification, the type capsule layer comprises two capsules, corresponding to the positive class and the negative class (stage 1: enhancer versus nonenhancer; stage 2: strong enhancer versus weak enhancer). Finally, the prediction probabilities for the two classes are obtained by calculating the lengths of the capsules in the type capsule layer.
The subset $S^{+}$ comprises 1484 enhancer samples, and $S^{-}$ comprises 1484 nonenhancer samples, forming the first stage of the data set. In addition, $S^{+}_{\mathrm{Strong}}$ consists of 742 strong enhancer samples, and $S^{+}_{\mathrm{Weak}}$ comprises 742 weak enhancer samples, constituting the second stage of the data set. The independent test set is utilized for assessing the model's performance, and it is sourced from the work of Liu et al., encompassing 100 strong enhancers, 100 weak enhancers, and 200 nonenhancers.

Figure 3. (A) Applying CGR encoding to an example sequence: CATG.(B) Dividing the CGR space during the iterative process.

Figure 4. Computational process between primary capsules and type capsules.

Figure 5. Visualization of positive and negative samples of the training set in the primary capsule and type capsule layers of CapsEnhancer in the two stages. (A) Primary capsule layer of stage 1. (B) Type capsule layer of stage 1. (C) Primary capsule layer of stage 2. (D) Type capsule layer of stage 2.

Figure 6. ROC curves for CapsEnhancer and the model without capsule network in (A) stage 1 and (B) stage 2.

Figure 7. Clustering analysis map of latent features generated by CapsEnhancer on the independent test set in stage 1.

Table 1. Summary of Existing Tools for Enhancer Identification

Table 2. Main Notations and Descriptions

Table 3. Performance Comparison with Other Existing Methods on the Independent Test Set of the First Stage: Enhancer Versus Non-Enhancer

Table 4. Performance Comparison with Other Existing Methods on the Independent Test Set of the Second Stage: Strong Enhancer Versus Weak Enhancer

Table 5. Performance Comparison with Other Existing Methods on the Independent Test Set of the Second Stage: Strong Enhancer Versus Weak Enhancer